Cache Pipelining with Partial Operand Knowledge
Authors
Abstract
Caches consume a significant amount of power in modern microprocessors, and their access time constrains clock frequency. In this paper, we propose a bit-sliced cache that reduces dynamic power consumption, permits a higher clock frequency, and increases cache throughput while adding little complexity. The bit-sliced cache reduces dynamic power by 20-40% across a variety of cache organizations by activating only the necessary row decoders and subarrays. To reduce cycle time, the cache access is pipelined, which yields higher bandwidth without the complexity, power, and area penalties of an additional cache port. We report cycle time improvements nearly proportional to the degree of bit-slice pipelining, as well as performance improvements averaging 9% and 11% for an out-of-order processor with a 2-sliced and a 4-sliced cache and ALU, respectively.
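As a rough illustration of the partial-operand idea named in the title, the following Python sketch models an address computed slice by slice, low-order bits first, and reports the slice stage at which the cache set index is fully known. The function names, the 32-bit width, and the cache geometry (5 line-offset bits, 7 set-index bits) are assumptions chosen for illustration and are not taken from the paper.

```python
def slice_add(base, offset, width=32, slices=2):
    """Add two operands one slice at a time, low-order slice first.

    Yields (bits_known, partial_sum) after each slice stage, where
    partial_sum holds only the low bits_known bits of the final sum.
    """
    bits = width // slices
    mask = (1 << bits) - 1
    carry = 0
    partial = 0
    for i in range(slices):
        a = (base >> (i * bits)) & mask
        b = (offset >> (i * bits)) & mask
        s = a + b + carry
        carry = s >> bits                      # carry into the next slice
        partial |= (s & mask) << (i * bits)    # commit this slice's bits
        yield (i + 1) * bits, partial


def index_ready_stage(width=32, slices=2, offset_bits=5, index_bits=7):
    """First slice stage (1-based) at which all set-index bits are known.

    The geometry (offset_bits, index_bits) is a hypothetical example.
    """
    bits = width // slices
    needed = offset_bits + index_bits
    return -(-needed // bits)                  # ceiling division


if __name__ == "__main__":
    for known, partial in slice_add(0x0001FFFF, 0x00000001):
        print(f"{known} bits known, partial sum = {partial:#x}")
    print("index ready at stage", index_ready_stage(slices=2))
    print("index ready at stage", index_ready_stage(slices=4))
```

Under these assumed parameters, the set index is available after the first stage of a 2-sliced adder, so row decoding could begin before the high-order tag bits are produced.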
Similar resources
Multiple-pass Pipelining: Enhancing In-order Microarchitectures to Out-of-order Performance
Out-of-program-order execution has become almost a ubiquitous characteristic of modern processors because of its ability to tolerate variable memory-instruction latency. As designs are becoming increasingly power-conscious, the cost and complexity of the components of out-of-order execution are becoming problematic. Compilers have generally proven adept at planning useful static instruction-lev...
One Address Computers are Faster and Use Less Memory Space to Execute Arithmetic Assignment Statements
A notation is developed which permits space and time efficiency comparisons of four basic computer architectures in use today for executing Fortran-style assignment statements. From the comparisons, we discover that a suitably designed 1-address architecture (one accumulator machine) outperforms the other architectures in speed of execution and in encoded size of compiled Fortran statements. The...
Stallscope: Illuminating the Black Box
As microprocessors become increasingly complex, cycle-accurate simulation has become a valuable tool for performance analysis and microarchitectural exploration. However, parallelism, complex interdependencies, and deep pipelining in modern superscalar processors make it difficult to identify how a particular microarchitectural design feature ultimately affects performance, particularly in...
A New Multiplier Using Wallace Structure and Carry Select Adder with Pipelining
Design of a high performance and high-density multiplier is presented. This multiplier is constructed by using the Wallace tree structure with pipelining. A fast carry select adder is used for the final two-operand adder. It is shown that the time delay for the entire multiplier is O(log(n)). The design is particularly carried out for a 32-bit multiplier with two sections of pipelining, to bala...
High Throughput Power-Aware FIR Filter Design Based on Fine-Grain Pipelining Multipliers and Adders
In a regular FIR structure, pipelining the multipliers improves throughput. But as the operand word length grows, the delay of the addition process becomes another important constraint. In this paper, a novel fine-grain pipelining scheme for high throughput FIR is proposed. By pipelining multipliers and adders, very high throughput can be achieved. 2-Dimensional pipeline gating tech...
Publication date: 2004